Abstract:
Active learning (AL) attempts to maximize a model's performance gain while annotating the fewest samples possible. Deep learning (DL) is greedy for data and requires a large supply of it to optimize a massive number of parameters if the model is to learn how to extract high-quality features. In recent years, due to the rapid development of internet technology, we have entered an era of information abundance characterized by massive amounts of available data. As a result, DL has attracted significant attention from researchers and has developed rapidly. Compared with DL, however, researchers have shown relatively little interest in AL. This is mainly because, before the rise of DL, traditional machine learning required relatively few labeled samples, meaning that early AL was rarely accorded the value it deserves. Although DL has made breakthroughs in various fields, most of this success is due to the large number of publicly available annotated datasets. However, the acquisition of large quantities of high-quality annotations consumes a lot of manpower, making it unfeasible in fields that require high levels of expertise (such as speech recognition, information extraction, and medical imaging). AL is therefore gradually receiving the attention it is due, and it is natural to investigate whether AL can be used to reduce the cost of sample annotation while retaining the powerful learning capabilities of DL. As a result of such investigations, deep active learning (DeepAL) has emerged. Although research on this topic is quite abundant, there has not yet been a comprehensive survey of DeepAL-related work; this article aims to fill that gap. We provide a formal classification method for the existing work, along with a comprehensive and systematic overview. In addition, we analyze and summarize the development of DeepAL from an application perspective. Finally, we discuss the confusion and problems associated with DeepAL and suggest some possible directions for development.
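As a concrete illustration of the pool-based setting that DeepAL builds on, the following is a minimal sketch of an active learning loop with least-confidence uncertainty sampling. The dataset, classifier, seed-set size, and annotation budget are illustrative assumptions, not choices made in the survey.

```python
# Minimal pool-based active learning loop with least-confidence sampling.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)

labeled = list(rng.choice(len(X), size=20, replace=False))  # small seed set
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for round_ in range(10):                      # illustrative annotation budget
    model.fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[pool])
    uncertainty = 1.0 - probs.max(axis=1)     # least-confidence criterion
    query = pool.pop(int(uncertainty.argmax()))
    labeled.append(query)                     # "annotate" the queried sample
    print(f"round {round_}: pool acc = {model.score(X[pool], y[pool]):.3f}")
```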
Abstract:
Deep neural network models owe their representational power and high performance in classification tasks to their large number of learnable parameters. Running deep neural network models in limited-resource environments is a problematic task. Models employing conditional computing aim to reduce the computational burden while retaining performance on par with more complex neural network models. This paper proposes a new model, Conditional Information Gain Networks as Sparse Mixtures of Experts (sMoE-CIGNs). A CIGN model is a neural tree that allows conditionally skipping parts of the tree based on routing mechanisms inserted into the architecture. These routing mechanisms are based on differentiable information gain objectives. CIGN groups semantically similar samples in the leaves, enabling simpler classifiers to focus on differentiating between similar classes. This lets the CIGN model attain high classification performance with lighter models. We further improve the basic CIGN model by proposing a sparse mixture of experts model for difficult-to-classify samples that may get routed to sub-optimal branches. If a sample's routing confidence is higher than a specific threshold, the sample may be routed towards multiple child nodes. The classification decision can then be taken as a mixture of these expert decisions. We learn the optimal routing thresholds by Bayesian optimization over a validation set, minimizing a weighted loss that includes the classification accuracy and the number of multiply-accumulate operations (MACs). We show the effectiveness of CIGN models enhanced with the sparse mixture of experts approach through extensive tests on the MNIST, Fashion MNIST, CIFAR 100, and UCI-USPS datasets, as well as comparisons with methods from the literature. sMoE-CIGN models can retain high generalization performance, on par with a thick unconditional model, while keeping the computational burden at the level of a much thinner model.
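A simplified sketch of the thresholded multi-path routing idea described above: route a sample to every child whose routing probability clears a threshold, then mix the selected experts' predictions. The router weights, expert functions, and threshold value tau are toy stand-ins for the paper's learned components.

```python
# Toy sparse mixture-of-experts routing: confident branches past a threshold
# are all taken, and their predictions are mixed with renormalized weights.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def smoe_route(x, router_w, experts, tau=0.3):
    """x: feature vector; router_w: (n_children, d); experts: list of fns."""
    p = softmax(router_w @ x)                 # routing probabilities
    selected = np.flatnonzero(p >= tau)       # sparse: only confident branches
    if selected.size == 0:                    # fall back to the single best child
        selected = np.array([p.argmax()])
    w = p[selected] / p[selected].sum()       # renormalized mixture weights
    preds = np.stack([experts[i](x) for i in selected])
    return w @ preds                          # weighted mixture of expert outputs

# Toy usage: two linear "experts" over a 4-d input, 3 classes.
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.normal(size=(3, 4)): softmax(W @ x) for _ in range(2)]
router_w = rng.normal(size=(2, 4))
print(smoe_route(rng.normal(size=4), router_w, experts))
```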
Abstract:
An energy-efficient deep neural network (DNN) accelerator, the unified neural processing unit (UNPU), is proposed for mobile deep learning applications. The UNPU can support both convolutional layers (CLs) and recurrent or fully connected layers (FCLs), accommodating versatile workload combinations to accelerate various mobile deep learning applications. In addition, the UNPU is the first DNN accelerator ASIC that supports fully variable weight bit precision from 1 to 16 bits, enabling it to operate at the accuracy-energy optimal point. Moreover, the lookup table (LUT)-based bit-serial processing element (LBPE) in the UNPU reduces energy consumption compared to a conventional fixed-point multiply-and-accumulate (MAC) array by 23.1%, 27.2%, 41%, and 53.6% for 16-, 8-, 4-, and 1-bit weight precision, respectively. Besides the energy-efficiency improvement, the unified DNN core architecture of the UNPU improves peak performance for CLs by 1.15x compared to the previous work, allowing the UNPU to operate at lower voltage and frequency for a given DNN and thereby increase energy efficiency. The UNPU is implemented in 65-nm CMOS technology and occupies a 4 x 4 mm² die area. It operates from a 0.63- to 1.1-V supply voltage with a maximum frequency of 200 MHz, and achieves a peak performance of 345.6 GOPS for 16-bit weight precision and 7372 GOPS for 1-bit weight precision. The wide operating range lets the UNPU achieve a power efficiency of 3.08 TOPS/W for 16-bit weight precision and 50.6 TOPS/W for 1-bit weight precision. The functionality of the UNPU is successfully demonstrated on a verification system using an ImageNet deep CNN (VGG-16).
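The variable precision rests on the bit-serial principle: an n-bit weight dot product decomposes into n shifted 1-bit passes, so one datapath serves any precision. The following is a software illustration of that decomposition, not a model of the LBPE circuit itself.

```python
# Software model of bit-serial MAC: an n-bit dot product computed as n shifted
# 1-bit (bit-plane) passes, the principle behind 1-to-16-bit variable precision.
import numpy as np

def bit_serial_dot(activations, weights, n_bits):
    """Dot product with unsigned n_bits-wide weights, one bit plane at a time."""
    acc = 0
    for b in range(n_bits):
        plane = (weights >> b) & 1               # 1-bit slice of every weight
        acc += (activations * plane).sum() << b  # accumulate with shift
    return acc

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=16)
w = rng.integers(0, 2**8, size=16)
assert bit_serial_dot(a, w, 8) == int(a @ w)     # matches a parallel MAC
```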
Abstract:
While the choice of prior is one of the most critical parts of the Bayesian inference workflow, recent Bayesian deep learning models have often fallen back on vague priors, such as standard Gaussians. In this review, we highlight the importance of prior choices for Bayesian deep learning and present an overview of different priors that have been proposed for (deep) Gaussian processes, variational autoencoders, and Bayesian neural networks. We also outline different methods of learning priors for these models from data. We hope to motivate practitioners in Bayesian deep learning to think more carefully about the prior specification for their models and to provide them with some inspiration in this regard.
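To make the review's point concrete, the snippet below contrasts the log-density of a vague standard Gaussian prior over a network weight with a heavier-tailed Student-t alternative; the specific distributions and parameters are illustrative, not recommendations from the review.

```python
# Comparing two candidate weight priors: a vague N(0, 1) and a Student-t.
import numpy as np
from scipy import stats

w = np.linspace(-6, 6, 5)
log_p_gauss = stats.norm.logpdf(w, loc=0.0, scale=1.0)  # standard Gaussian
log_p_student = stats.t.logpdf(w, df=3.0)               # heavier tails
print(np.column_stack([w, log_p_gauss, log_p_student]))
# The Student-t penalizes large weights far less harshly: one way the prior
# choice encodes different beliefs about a network's weights.
```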
Abstract:
The microgrid marks a new era in power systems and offers wide scope for research. Due to increasing demand and future expansion of the power system, analyzing the complexities of the network becomes a challenging task. Artificial intelligence plays a vital role in resolving such issues in a microgrid in various respects. Owing to steadily decreasing computational costs and enhanced data-analysis algorithms, artificial intelligence has entered a new epoch: Artificial Intelligence (AI) 2.0. Machine learning initially evolved as the foundation of AI 2.0 and has since developed branches such as deep learning, reinforcement learning, and their combination, deep reinforcement learning. These algorithms are well suited to decision-making in complex networks. This paper examines the numerous challenges addressed by the above algorithms to establish the significance of AI 2.0, and summarizes their applications to microgrids as an aid to power system analysis.
Abstract:
Deep learning has been the answer to many machine learning problems during the past two decades. However, it comes with two significant constraints: dependency on extensive labeled data and training costs. Transfer learning in deep learning, known as Deep Transfer Learning (DTL), attempts to reduce such reliance and costs by reusing knowledge obtained from a source data/task when training on a target data/task. Most applied DTL techniques are network/model-based approaches. These methods reduce the dependency of deep learning models on extensive training data and drastically decrease training costs. Moreover, the reduction in training cost makes DTL viable on edge devices with limited resources. Like any new advancement, DTL methods have their own limitations, and a successful transfer depends on specific adjustments and strategies for different scenarios. This paper reviews the concept, definition, and taxonomy of deep transfer learning and its well-known methods. It investigates DTL approaches by reviewing the DTL techniques applied over the past five years, together with several experimental analyses, to discover best practices for using DTL in different scenarios. Moreover, the limitations of DTL (the catastrophic forgetting dilemma and overly biased pre-trained models) are discussed, along with possible solutions and research trends.
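A minimal sketch of the network/model-based DTL pattern the review describes: reuse a pre-trained backbone, freeze its transferred parameters, and train only a new task head. It assumes PyTorch/torchvision; the 10-class head and dummy batch are illustrative.

```python
# Network/model-based transfer: frozen pre-trained backbone, trainable new head.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False                    # freeze transferred knowledge

backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # new target-task head
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)

x = torch.randn(4, 3, 224, 224)                # dummy target-domain batch
loss = nn.functional.cross_entropy(backbone(x), torch.randint(0, 10, (4,)))
loss.backward()                                # gradients reach only the head
optimizer.step()
```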
Abstract:
Smart grids are the developmental trend of power systems, and they have attracted much attention all over the world. Due to the complexity and uncertainty of the smart grid and the high volume of information being collected, artificial intelligence techniques represent some of the enabling technologies for its future development and success. Owing to the decreasing cost of computing power, the profusion of data, and better algorithms, AI has entered a new developmental stage, and AI 2.0 is developing rapidly. Deep learning (DL), reinforcement learning (RL), and their combination, deep reinforcement learning (DRL), are representative and relatively mature methods in the AI 2.0 family. This article introduces the concept and status quo of these three methods, summarizes their potential for application in smart grids, and provides an overview of research on their application in smart grids.
Abstract:
In this paper, we propose a set of algorithms to design signal timing plans via deep reinforcement learning. The core idea of this approach is to set up a deep neural network (DNN) to learn the Q-function of reinforcement learning from the sampled traffic state/control inputs and the corresponding traffic system performance output. Based on the obtained DNN, we can find the appropriate signal timing policies by implicitly modeling the control actions and the change of system states. We explain the possible benefits and implementation tricks of this new approach. The relationships between this new approach and some existing approaches are also carefully discussed.
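A minimal sketch of the core idea: a DNN approximates the Q-function over traffic states and is fit to the Bellman target from sampled state/action/performance data. The state size, number of signal phases, network shape, and reward here are assumptions for illustration, not the paper's configuration.

```python
# One DQN-style gradient step: fit Q(s, a) to the Bellman target r + gamma*maxQ.
import torch
import torch.nn as nn

n_state, n_phases = 8, 4                      # e.g. queue lengths; signal phases
qnet = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(), nn.Linear(64, n_phases))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
gamma = 0.95

# Sampled batch of transitions (random dummies stand in for traffic data).
s = torch.randn(32, n_state)
a = torch.randint(0, n_phases, (32,))
r = torch.randn(32)                           # e.g. negative total vehicle delay
s2 = torch.randn(32, n_state)

with torch.no_grad():
    target = r + gamma * qnet(s2).max(dim=1).values    # Bellman target
q_sa = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)    # Q(s, a) of taken actions
loss = nn.functional.mse_loss(q_sa, target)
opt.zero_grad(); loss.backward(); opt.step()
```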
Abstract:
Reinforcement learning has been used to model a task by performing it repeatedly to obtain the maximum results under a reward and punishment policy. It has been applied in games and agent-based modelling. In a game, the game agent or non-player character can be modelled using several techniques to achieve its goal (e.g., reinforcement learning, deep neural networks, and Monte Carlo Tree Search). Deep neural networks and Monte Carlo Tree Search, two more sophisticated techniques within reinforcement learning algorithms, have helped modern reinforcement learning resolve more challenging problems. However, this area faces two challenges: the minimum amount of data needed to build a model, and generalization to different environments. Determining the minimum amount of data an architecture requires to train a model is quite cumbersome to apply to real-world jobs and situations, since it demands that substantial data be explicitly provided along with trial-and-error reconfiguration. This work proposes a data-efficient reinforcement learning model that augments the data and implements episodic memory. To illustrate the effectiveness of the proposed model, this research compares a Deep Q-Network (DQN) with episodic memory to the same model with both data augmentation and episodic memory. The model augments observations before they are stored in the agent's memory, causing the agent to use the same logic and take the same action in comparable situations. The results demonstrate that the augmented model can surpass the baseline model in speed, being 50% quicker.
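A sketch of the data-efficiency mechanism described above: each observation is augmented before being stored in the agent's memory, so comparable situations share stored experience. The Gaussian-jitter augmentation, memory layout, and copy count are illustrative assumptions, not the paper's exact design.

```python
# Augment observations before storing them in an episodic replay buffer.
import random
from collections import deque

import numpy as np

memory = deque(maxlen=10_000)                 # simple episodic replay buffer

def augment(obs, rng):
    """Illustrative augmentation: small Gaussian jitter on the observation."""
    return obs + rng.normal(scale=0.01, size=obs.shape)

def store(obs, action, reward, next_obs, rng, n_copies=3):
    memory.append((obs, action, reward, next_obs))        # original transition
    for _ in range(n_copies):                             # augmented copies
        memory.append((augment(obs, rng), action, reward,
                       augment(next_obs, rng)))

rng = np.random.default_rng(0)
store(np.zeros(4), action=1, reward=0.5, next_obs=np.ones(4), rng=rng)
batch = random.sample(list(memory), k=min(4, len(memory)))  # training batch
```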
Abstract:
Deep learning techniques have been paramount in recent years, mainly due to their outstanding results in a number of applications that range from speech recognition to face-based user identification. Among the techniques employed for such purposes, Deep Boltzmann Machines (DBMs), which are composed of layers of Restricted Boltzmann Machines stacked on top of each other, are among the most widely used. In this work, we evaluate the concept of temperature in DBMs; temperature plays a key role in Boltzmann-related distributions but has never before been considered in this context. Therefore, the main contribution of this paper is to take this information into account, together with the impact of replacing the standard sigmoid function with an alternative, and to evaluate their influence on DBMs in the task of binary image reconstruction. We expect this work to foster future research on the use of different temperatures during learning in DBMs.
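A minimal sketch of where temperature enters an RBM/DBM layer, assuming the common formulation in which the hidden-unit activation probability becomes sigmoid(pre-activation / T), with T = 1 recovering the standard model; the weights and temperature values below are illustrative.

```python
# Temperature-scaled hidden-unit activation for one RBM layer of a DBM.
import numpy as np

def hidden_probs(v, W, b, T=1.0):
    """P(h=1 | v) for an RBM layer at temperature T."""
    return 1.0 / (1.0 + np.exp(-(v @ W + b) / T))

rng = np.random.default_rng(0)
v = rng.integers(0, 2, size=6)                # binary visible units
W = rng.normal(size=(6, 3)); b = np.zeros(3)
for T in (0.5, 1.0, 2.0):
    print(T, hidden_probs(v, W, b, T))        # low T sharpens, high T flattens
```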